ABSTRACT

A cell division cycle is a well-coordinated process in
eukaryotes, with cell cycle genes exhibiting periodic
expression over time. There is considerable interest among
cell biologists in determining which genes are periodic in
multiple organisms and whether such genes are also
evolutionarily conserved in their relative order of time to
peak expression. Interestingly, periodicity is not well
conserved evolutionarily. A conservative estimate of the
number of periodic genes common to fission yeast
(Schizosaccharomyces pombe) and budding yeast
(Saccharomyces cerevisiae) ('core set FB') is 35, while the
number common to fission yeast and humans (Homo sapiens)
('core set FH') is 24. Using a novel statistical methodology,
we discover that the relative order of peak expression is
conserved in ~80% of FB genes and in ~40% of FH genes. We
also discover that the order is evolutionarily conserved in
six genes, which are potentially the core set of signature
cell cycle genes. These include ace2 (a transcription factor)
and the polo-kinase plo1, which are well-known hubs of early
M-phase clusters; cdc18, a key component of pre-replication
complexes; mik1, which is critical for the establishment and
maintenance of the DNA damage check point; and the histones
hhf1 and hta2.

INTRODUCTION 

A cell division cycle among eukaryotes consists of a
sequence of four major phases, namely, the G1, S, G2 and M
phases. The G1 phase (also known as the Gap 1 phase) is a
resting phase in which the cells grow in size and prepare for
synthesis during the S phase. Furthermore, the G1 phase also
serves as one of two major check points: if DNA damage is
detected, the cell is prevented from proceeding to the S
phase (1,2). Cells that pass the G1 check point proceed to
the S phase, where DNA replication takes place. This phase is
followed by the G2 or Gap 2 phase. Similar to G1, this phase
serves as a check point to ensure that cells with damaged DNA
do not proceed to the M phase (or mitosis), where the cells
divide to form two daughter cells.

Genes participating in a cell division cycle have a
cyclical pattern of expression, with the peak attained just
before their function (3). There are several intrinsic
differences among organisms in various aspects of the cell
division cycle. For instance, the amount of time spent by a
cell in different phases varies. A fission yeast,
Schizosaccharomyces pombe (S. pombe), cell spends almost 70%
of its time in the G2 phase, whereas a budding yeast,
Saccharomyces cerevisiae (S. cerevisiae), cell spends roughly
a quarter of its time in the G2 phase. Arabidopsis thaliana
has a relatively short G2 phase but a long S phase. In
contrast to S. pombe, a human cell may spend substantially
more time in the G1 phase than in the G2 phase (see
www.cyclebase.org). Also, the proportion of genes that are
known to participate in the cell division cycle varies with
the organism. For example, it is estimated that there are
twice as many periodic genes in S. cerevisiae as in
S. pombe (4).

Despite many such differences, researchers are inter- 
ested in (i) identifying genes that are periodic in multiple 
organisms (referred to as 'periodically conserved' genes) 
(Figure 1); (ii) among periodically conserved genes, iden- 
tifying those that are also conserved in their phase of peak 
expression (Figure 1). This has been an active area of 
research over the past several decades [cf. (3,5)]. With 
the advent of microarray technology, numerous micro- 
array studies have been conducted on several model or- 
ganisms such as S. cerevisiae (6-9), S. pombe (4,10-12), 
Homo sapiens (13) and Arabidopsis (14,15). Such
large-scale genome-wide data on multiple organisms
provide an excellent opportunity to determine genes
involved in the cell cycle and study their functions. They also
allow one to understand the similarities and
dissimilarities in the cell cycle of various organisms. A
useful database containing results from various cell cycle
microarray experiments is available at www.cyclebase.org
(16), henceforth referred to as 'cyclebase'. These microarray
data allow biologists to debate the conservation of genes
participating in a cell cycle and their times of peak expression
(3,4,10,11,12,17). Based on a comprehensive analysis
of the above microarray data and other published data, 
Jensen et al. (3) concluded that both periodicity as well as 
phase of peak expression are evolutionarily poorly 
conserved. Earlier a similar conclusion was drawn by 
Rustici et al. (4) who concluded that only 40 or so 
orthologs are periodic in both species of yeasts. 

Although the poor conservation of periodicity and of the
phase of peak expression for most cell cycle genes may be
biologically plausible, since the functions of some genes may
have changed through evolution, one cannot ignore variability
between and within studies that may have contributed to these
findings. For instance, even within the same organism there
are major differences among studies published in the
literature. Recently, three different groups of researchers
conducted a total of 10 microarray experiments on S. pombe
[5 by Rustici et al. (4), 3 by Oliva et al. (11) and 2 by
Peng et al. (12)]. As summarized by Oliva et al. (11) and by
Caretta-Cartozo et al. (17), the three studies disagreed
considerably on the number of periodic genes. According to
Caretta-Cartozo et al. (17), only 156 out of ~5000 genes in
the S. pombe genome were declared to be periodic by all three
studies, although individually each group identified at least
three times as many periodic genes. Furthermore, even among
genes that were found to be periodic in at least two of the
three studies, there were disagreements among the studies in
terms of time to peak expression of some genes. For instance,
according to Peng et al. (12), cdc18, mob1, imp2 and cig2
peak during the G1 phase, whereas Oliva et al. (11), Rustici
et al. (4) and cyclebase suggest that these are M phase
genes.

The above discrepancies among studies, even in the same
organism, are not surprising and can be attributed to various
factors such as natural variability in the data, experimental
conditions, etc. Factors such as these result in statistical
variability and uncertainty in estimates of time to peak
expression. Not much has been discussed in the literature
regarding these issues. The problem becomes even more
challenging when comparisons across multiple organisms are to
be made. In such comparisons, the biological differences
among organisms may be confounded by statistical
uncertainties due to variability in the data. For example,
Rustici et al. (4) concluded that cell cycle regulation of
the majority of genes is not conserved between S. pombe and
S. cerevisiae. On the other hand, Oliva et al. (11) and Peng
et al. (12) suggest a greater degree of similarity between
the two species of yeast and infer conservation of regulatory
mechanisms.

Since cell division cycle is a carefully orchestrated 
process, it is reasonable to hypothesize that the relative 
order of peak expression among a core set of cell cycle 
genes may be conserved even if the phase of some genes 
may have been evolutionarily modified (Figure 2). Genes 
whose relative order of time to peak expression is 
conserved may not only have a well-defined function in 
cell division cycle, but may also have potential interactions 
or associations with each other. In this article, we develop 
a formal statistical methodology to test the hypothesis 
that the relative order of peak expression among a core 
set of cell cycle genes is conserved in a pair of organisms. 
Using this methodology, we investigate if the 'core set FB' 
of fission yeast genes have the same relative order of peak 
expression as their budding yeast orthologs. Similarly, we 
investigate if the 'core set FH' of fission yeast genes have 
the same relative order of peak expression as their human 
orthologs. The statistical procedure developed in this 
article is novel and could help biologists formulate and 
test other similar hypotheses. 

MATERIALS AND METHODS 

Relative order of peak expression among cell cycle genes 

Figure 2. (I) Relative order of peak expression of genes A, B and C is
not conserved in Species 1 and 2. (II) Relative order of peak expression
of genes D, E and F is conserved in Species 1 and 2. Orthologs in
Species 2 are denoted by lower case letters a, b, c, d, e and f. In each
panel, the vertical hash mark on the time axis represents the boundary
of a phase. Insets in each panel represent the time to peak expression
in terms of phase of cell cycle.

For a gene D, the time to its peak expression can be
described in terms of an angle on a unit circle, known as the
phase angle, which is denoted by $\phi_D$. Suppose D, E and F
are three cell cycle genes where D is an S phase gene, E is
an early G2 phase gene and F is a mid-G2 phase gene. Then
they satisfy the relative order: D followed by E, which is
followed by F, which is in turn followed by D. We represent
this relative order of peak expression among the three genes
by $\phi_D \preceq \phi_E \preceq \phi_F \preceq \phi_D$ (or
equivalently, $D \preceq E \preceq F \preceq D$). Suppose
S. cerevisiae genes D, E and F are the orthologs of S. pombe
genes d, e and f, respectively. Suppose
$D \preceq E \preceq F \preceq D$; then we say that the
relative order is conserved among the orthologs if
$d \preceq e \preceq f \preceq d$ (see Figure 2). In the
ideal setting, if the functions of all genes in the core set
are conserved through evolution and if the cell cycle is a
well-ordered mechanism of nature, then it is reasonable to
hypothesize that the relative order of expression of genes in
the core set is conserved between the two organisms. Note
that the relative order is invariant to the location of the
pole of the circle. This is important for several reasons.
First, biologically, the order of genes around the circle has
no bearing on where the pole of the circle is established
(i.e. it is rotation invariant). Secondly, a common challenge
with time course cell cycle experiments is that one cannot be
sure about the exact biological time when the cells were
arrested, so the pole cannot be defined precisely. Also,
different labs and experiments may arrest cells during
different phases of the cell cycle. Consequently, it can be
challenging to compare phase angles across experiments since
each experiment may have a different pole. However, the
relative order of genes should be invariant to the location
of the pole. Also, it is important to note that our
definition of conservation of relative order does not require
the ortholog pairs (D, d), (E, e), (F, f), etc. to have the
same phase angles or even the same phases (see the right
panels in Figure 2). We just require d, e, f to satisfy the
same relative order as D, E, F.
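The notion of a conserved relative order, and its invariance to the choice of pole, can be illustrated with a small sketch. The function names `circular_order` and `order_conserved` and the phase angles below are hypothetical and chosen only for illustration; the sketch assumes distinct phase angles (no ties) and a one-to-one ortholog map, and it only compares point estimates, so it is not the statistical test developed in this article.

```python
import math

def circular_order(phases):
    """Return gene names sorted by phase angle (radians, taken mod 2*pi),
    rotated so the list starts at the alphabetically smallest gene name.
    Assumes all phase angles are distinct."""
    two_pi = 2 * math.pi
    ordered = sorted(phases, key=lambda gene: phases[gene] % two_pi)
    start = ordered.index(min(ordered))
    return ordered[start:] + ordered[:start]

def order_conserved(phases_1, phases_2, orthologs):
    """Check whether the circular order of species-1 genes matches that of
    their species-2 orthologs. `orthologs` maps species-1 gene -> species-2 gene."""
    order_1 = [orthologs[gene] for gene in circular_order(phases_1)]
    order_2 = circular_order(phases_2)
    start = order_2.index(order_1[0])          # align both cycles on the same gene
    return order_1 == order_2[start:] + order_2[:start]

# Hypothetical phase angles (radians) for three genes and their orthologs.
species_1 = {"D": 1.0, "E": 2.0, "F": 3.0}
species_2 = {"d": 4.0, "e": 5.5, "f": 0.5}     # different phases, same cyclic order
mapping = {"D": "d", "E": "e", "F": "f"}

print(order_conserved(species_1, species_2, mapping))   # True

# Moving the pole (adding a constant to every angle) does not change the answer.
shifted = {gene: angle + 2.5 for gene, angle in species_2.items()}
print(order_conserved(species_1, shifted, mapping))     # True
```

Because only the cyclic sequence of genes is compared, the orthologs may peak in entirely different phases and still count as having a conserved relative order.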

Using Rustici et al. (4), Oliva et al. (11) and cyclebase,
we arrived at the core set FB of 35 S. pombe cell cycle genes
that are periodic in both yeasts (Table 1). Similarly, using
cyclebase we arrived at the core set FH of 24 S. pombe cell
cycle genes that are periodic in both S. pombe and H. sapiens
(Table 2). We limited our core sets to include only those
genes whose cyclebase periodicity rank is <500. The rank
cut-off of 500 was chosen arbitrarily. Our point is that
genes with higher ranks are less likely to be periodic, and
their estimated phase angles have a small concentration
parameter, resulting in large uncertainty in the estimates.

In the case of human orthologs, we relied completely on
the peaktime specified by the cyclebase database to arrive at
their relative order (Table 2). However, in the case of
budding yeast orthologs, in addition to cyclebase, we
used published literature and the Saccharomyces Genome
Database (http://www.yeastgenome.org/) to arrive at the
relative order (Table 1).

Table 1. Schizosaccharomyces pombe genes in the core set FB arranged
according to the relative order of S. cerevisiae orthologs

We now describe the relative order of the 35 S. cerevisiae
orthologs. We begin with CDC6, since its mRNA level as well
as its protein level peaks during the early stages of G1
phase and it is the precursor for DNA synthesis. This gene is
followed by several G1 phase genes such as those involved in
DNA repair, replication and check point (Replication Factor
Alpha, RNR1, MSH6, MRC1 and POL1), cohesion of sister
chromatids (SMC3 and MCD1), recombinational repair of
double-strand breaks in DNA (RAD51), activation of Cdc28p to
promote transition from G1 to S phase (CLN2, a late G1 phase
cyclin) and DNA synthesis during DNA repair (POL2). The S
phase genes that follow the G1 phase genes are those involved
in regulation of the G2/M transition by inhibition of Cdc28p
kinase activity (SWE1), chromatin assembly (histones such as
HHT2, HHF1, HHT1, HTA2, HTB2, HTZ1) and mitotic spindle
positioning (KIP3). These are followed by G2 phase
transcription factors such as FKH1 and SWI5. Several G2 phase
genes considered here have proteins involved in important
functions such as bud site selection (BUD4) and cytokinesis
and septation (CDC5, MOB1, ASE1, MYO1, CHS2, HOF1). Note that
we considered both S. pombe orthologs of HOF1, namely, cdc15
and imp2, in our analysis. These genes are followed by KIN3,
a G2/M check point gene whose protein Kin3 plays a critical
role in DNA damage recognition before the cell enters M phase
(18). Among the M phase genes in the proposed relative order,
DBF2 and CDC20 function in the exit of cells from mitosis,
while PST1 functions in the construction of the cell wall.
Our proposed relative order concludes with DSE4, a daughter
cell-specific protein which degrades the cell wall, causing
the daughter cell to separate from the mother cell. Hence, it
is logical that DSE4 is the last gene in our proposed
relative order before returning to CDC6. According to
cyclebase, some of the S. cerevisiae orthologs in the core
set FB have identical peaktimes; hence we assigned identical
phase angles to all such genes in the null hypothesis.
Specifically, the following genes within parentheses were
hypothesized to have the same phase angles: (cdc22, msh6,
mrc1, pol1, psm3), (rad21, rhp51), (cig2, pol2), (ace2, mid2,
plo1), (mcp1, myo3, chs2) and (cdc15, imp2). The resulting
relative order is described in Figure 3, where genes that
were hypothesized to have the same phase angle are along the
same ray from the center of the circle (ray not drawn). See
also Table 1. Known biological functions of genes in the core
set FB are provided in Supplementary Table S1.

Table 2. Schizosaccharomyces pombe genes in the core set FH arranged
according to the relative order of H. sapiens orthologs

Figure 3. Schizosaccharomyces pombe genes arranged according to the
relative order and approximate locations of their S. cerevisiae
orthologs (in parentheses). Sectors drawn are according to the S. pombe
cell cycle.

The goal of this study is to test whether the S. pombe genes
satisfy the relative order specified by the S. cerevisiae
orthologs. Thus we tested the null hypothesis that the phase
angle of cdc18 is followed by the phase angle of ssb1, ...,
SPAC1705.03c, which in turn is followed by the phase angle of
eng1, which in turn is followed by the phase angle of cdc18,
against the alternative hypothesis that this order is not
true. More precisely:

$$H_0: \phi_{cdc18} \preceq \phi_{ssb1} \preceq \phi_{cdc22} = \phi_{msh6} = \cdots = \phi_{psm3} \preceq \cdots \preceq \phi_{SPAC1705.03c} \preceq \phi_{eng1} \preceq \phi_{cdc18} \qquad (1)$$

$$H_1: H_0 \text{ is not true}$$
Similarly, we tested the following hypotheses to see
whether the S. pombe genes satisfy the relative order
specified by the 24 H. sapiens orthologs:

$$H_0: \phi_{ssn6} \preceq \phi_{cdc22} \preceq \phi_{ace2} \preceq \cdots \preceq \phi_{plo1} = \cdots = \phi_{cig1} \preceq \phi_{bir1} = \phi_{slp1} \preceq \phi_{acm} \preceq \phi_{ssn6} \qquad (2)$$

$$H_1: H_0 \text{ is not true}$$

There are two reasons (biological and statistical) for the 
above formulation of null and alternative hypotheses. 
First, as stated earlier, the cell division cycle is a funda- 
mental process in eukaryotes and one would expect 
various aspects of this process to be conserved through 
evolution. This is the basic premise of many recent 
papers [e.g. (4)] which tried to identify genes that are 
periodic in multiple organisms. Since our investigation is 
based on such conserved genes, the conservation of the 
relative order should be the null hypothesis rather than 
the alternative hypothesis. Basically, among genes that 
are declared to be conserved between species, our null 
hypothesis states that their relative order is also conserved. 
There is also a statistical reason for our choice of null and
alternative hypotheses. If the null hypothesis were that the
relative order is not conserved among the q genes in the
two organisms, then the null hypothesis would contain
(q - 1)! configurations of parameters and the alternative
hypothesis would contain only one configuration. As q
increases, the null hypothesis becomes too large and is
unlikely ever to be rejected. For example, if q = 35 the total
number of possible null configurations is of the order of
$10^{40}$, which is extremely large. No statistical test would
have sufficient power to reject the null hypothesis in
such situations. In fact, the power goes to zero as q
increases.
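To give a sense of scale for the (q - 1)! count, the arithmetic can be checked directly. This is a minimal sketch; the helper name `circular_orders` is hypothetical. For q = 35 it gives 34!, which is roughly 3 × 10^38, while 35! is roughly 10^40; whichever way the configurations are counted, the number is astronomically large.

```python
import math

def circular_orders(q: int) -> int:
    """Number of distinct circular arrangements of q labelled genes: (q - 1)!."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    return math.factorial(q - 1)

for q in (5, 10, 35):
    # Scientific notation keeps the large values readable.
    print(q, f"{circular_orders(q):.3e}")
```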

Statistical test 

For each gene i, i = 1, 2, ..., q, and experiment j, j = 1,
2, ..., E, we model the unconstrained estimator of the phase
angle of peak expression, $\hat\theta_{ij}$, obtained from the
Random Period Model (RPM) (19), using the von Mises (VM)
distribution. This distribution plays a role in circular data
analysis similar to that of the normal distribution for
Euclidean data. Thus, we assume that
$\hat\theta_{ij} \sim VM(\phi_{ij}, \kappa_j)$, where
$\phi_{ij}$ is the true unknown phase angle of peak expression
of gene i in the j-th experiment. The concentration parameter
$\kappa_j$ represents the uncertainty associated with
$\hat\theta_{ij}$. We assume that $\kappa_j$ depends on
experiment j but not on gene i. There are two sources of
uncertainty associated with the phase angle estimate of each
gene: one is specific to the gene and the other is due to the
experiment (and is common to all genes within the
experiment). This resembles the classical mixed effects
linear model for Euclidean space data. Since the number of
time points used in each of the time course experiments
considered in this article is fairly large, for any specific
gene, the uncertainty associated with the estimator of the
phase angle based on the RPM is negligible relative to the
uncertainty due to the experiment. For this reason, we only
retained the uncertainty component corresponding to the
experiment.



For each experiment j, j = 1, 2, ..., E, our problem of
interest is to test the following hypotheses:

$$H_0: \text{the phase angles } \phi_{ij},\ i = 1, \ldots, q, \text{ follow a known relative order} \qquad (3)$$

$$H_1: H_0 \text{ is not true.}$$

Let $C_l = \{x \in \mathbb{R}^q : 0 \le x_l \le \cdots \le x_q \le x_1 \le \cdots \le x_{l-1} \le 2\pi\}$
be a simple order constraint starting at index l and let
$\phi_j = (\phi_{1j}, \phi_{2j}, \ldots, \phi_{qj})'$. Then, for
each j = 1, 2, ..., E, the above null hypothesis can be
rewritten as $\phi_j \in \mathcal{C}_q$, where

$$\mathcal{C}_q = \bigcup_{1 \le l \le q} C_l.$$

Let $\tilde\theta_j = (\tilde\theta_{1j}, \tilde\theta_{2j}, \ldots, \tilde\theta_{qj})'$
denote the restricted maximum likelihood estimator of
$\phi_j$ subject to the constraint $\phi_j \in \mathcal{C}_q$
(20). $\tilde\theta_j$ determines a partition of
$I = \{1, \ldots, q\}$ into sets of consecutive coordinates on
which $\tilde\theta_j$ is constant. These sets are called
level sets. Then, we construct the following test statistic
to test the above hypotheses:

$$T_j = 2\kappa_j \sum_{i=1}^{q} \left(1 - \cos(\hat\theta_{ij} - \tilde\theta_{ij})\right).$$

Notice that $T_j$ is a measure of the angular distance between
the unconstrained estimator $\hat\theta_j$ and the restricted
estimator $\tilde\theta_j$. Our conditional test based on
$T_j$, described in detail in the Supplementary Data, rejects
the null hypothesis whenever $T_j > c(m)$, where m is the
number of level sets for $\tilde\theta_j$, c(m) is chosen so
that $\mathrm{pr}(\chi^2_{q-m} > c(m)) = \alpha / (1 - \mathrm{pr}_{\phi^0}(\mathcal{C}_q))$,
and $\phi^0$ is any point in the null hypothesis for which all
coordinates are equal. Since in practice the $\kappa_j$ are
unknown, we estimated them by their maximum likelihood
estimators, obtained using an analysis of variance approach
based on the von Mises distribution (see Section 1.3 in
Supplementary Data).
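As an illustration of how the test statistic behaves, the following minimal sketch evaluates T = 2κ Σ (1 - cos(unconstrained estimate - restricted estimate)) for one experiment. The function name `order_test_statistic` and the numerical inputs are hypothetical. The sketch assumes that the restricted estimates and the concentration parameter have already been computed elsewhere; it does not implement the restricted maximum likelihood estimator or the critical value c(m).

```python
import math

def order_test_statistic(theta_hat, theta_tilde, kappa):
    """Angular distance statistic for one experiment.

    theta_hat   : unconstrained phase angle estimates (radians), one per gene
    theta_tilde : restricted estimates under the hypothesized circular order
    kappa       : concentration parameter of the experiment (assumed known here)
    """
    if len(theta_hat) != len(theta_tilde):
        raise ValueError("both estimate vectors must have one entry per gene")
    distance = sum(1.0 - math.cos(a - b) for a, b in zip(theta_hat, theta_tilde))
    return 2.0 * kappa * distance

# Hypothetical estimates for four genes: the data already satisfy the order,
# so the restricted estimates equal the unconstrained ones and T is zero.
print(order_test_statistic([0.5, 1.2, 2.0, 4.1], [0.5, 1.2, 2.0, 4.1], kappa=3.0))

# Two genes violate the order and are pooled to a common value, so T is positive.
print(order_test_statistic([0.5, 1.6, 1.2, 4.1], [0.5, 1.4, 1.4, 4.1], kappa=3.0))
```

The statistic is zero when the unconstrained estimates already satisfy the hypothesized order and grows with both the size of the violations and the concentration parameter.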

Theoretical details of $T_j$ are given in Section 1.4 of
Supplementary Data. There we demonstrate that our
proposed test is an asymptotic $\alpha$-level test. The derivation
of this latter property is not straightforward as several 
statistical issues arise as a result of the specific character- 
istic of the testing problem, namely, a complicated null 
hypothesis for a directional parametric model. 

Lack of fit criterion for a given relative order 

For a relative order specified by the null hypothesis,
let $p_j$ denote the P-value associated with the j-th
experiment, j = 1, 2, ..., E, and let
$L = -\sum_{j=1}^{E} \log(p_j)$. Note that L always lies
between 0 and $\infty$, with smaller values corresponding to
better fit. In the extreme case, if the presumed relative
order is perfectly satisfied within each experiment, with a
P-value of 1, then L = 0. Thus, among a collection of
plausible orders for a set of cell cycle genes, a biologist
may choose the order that corresponds to the smallest value
of L. Note that under the null hypothesis, if the P-values
are independently and uniformly distributed in the interval
(0, 1), then 2L is distributed as a central $\chi^2$ random
variable with 2E degrees of freedom. This is often known as
Fisher's method of combining P-values and yields a formal
statistical test.
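The lack of fit criterion is simple to compute. The sketch below applies it to the P-values listed in Table 3; the function names are hypothetical. It assumes natural logarithms, and it uses the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom for the combined P-value. Because the tabulated P-values are rounded, the result for all 35 genes only approximately matches the reported value of 46.75.

```python
import math

def lack_of_fit(p_values):
    """L = -sum(log p_j), using natural logarithms; requires every p_j > 0."""
    return -sum(math.log(p) for p in p_values)

def fisher_combined_pvalue(p_values):
    """Upper tail probability of a chi-square with 2E degrees of freedom at 2L.

    For even degrees of freedom 2E, pr(chi2 > 2L) = exp(-L) * sum_{k<E} L**k / k!.
    """
    L = lack_of_fit(p_values)
    term, total = 1.0, 1.0                 # k = 0 term of the series
    for k in range(1, len(p_values)):
        term *= L / k
        total += term
    return math.exp(-L) * total

# P-values from Table 3 (10 S. pombe experiments).
all_35 = [0.08, 0.69, 0.14, 0.24, 0.57, 5.37e-11, 0.19, 0.07, 0.24, 2.88e-05]
final_28 = [0.53, 0.98, 0.41, 0.99, 0.99, 0.34, 0.86, 0.88, 0.99, 0.53]

print(round(lack_of_fit(all_35), 2))       # close to the reported 46.75
print(round(lack_of_fit(final_28), 2))     # close to the reported 3.57
print(fisher_combined_pvalue(all_35), fisher_combined_pvalue(final_28))
```

A P-value of exactly zero (as can appear after rounding) makes L infinite, so in practice such entries would need to be reported with more digits before the criterion can be evaluated.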
RESULTS

Using the 10 S. pombe time course experimental data sets
(4,11,12), we first obtained the unconstrained phase angle
estimates of genes in the core sets FB and FH (Supplementary
Tables S2 and S3), which were then used for testing the
various hypotheses described in this article. Using the
estimates in Supplementary Table S2, we tested the hypotheses
appearing in Equation (1), namely that all 35 S. pombe genes
in FB satisfy the relative order specified by the
S. cerevisiae orthologs, against the alternative hypothesis
that they do not. The null hypothesis is rejected at P < 0.15
in 5 out of 10 experiments (Table 3). Of these five
experiments, two have P < 0.0001. If the null hypothesis were
true in each of the 10 experiments, then the binomial
probability of observing two or more experiments (out of 10)
with P = 0.0001 is $4.49 \times 10^{-7}$, which is extremely
small. This suggests that the relative order hypothesized in
Equation (1) may not be true and thus that the 35 S. pombe
genes do not all follow the same relative order as their
S. cerevisiae orthologs. Of course, in the above argument we
implicitly assume that the outcomes of the 10 experiments are
identically and independently distributed. Although this is a
commonly made assumption, we acknowledge that it may be
restrictive.
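The binomial probability quoted above can be reproduced with a short calculation. This is a sketch under the same assumption of 10 independent experiments, each yielding a P-value at or below 0.0001 with probability 0.0001 under the null hypothesis; the helper name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """pr(X >= k) for X ~ Binomial(n, p), summed term by term to avoid
    the cancellation error of computing 1 - pr(X < k) for tiny tails."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Two or more of the 10 experiments with a P-value as small as 0.0001.
prob = binomial_upper_tail(10, 2, 1e-4)
print(f"{prob:.3e}")   # roughly 4.5e-07, in line with the value quoted in the text
```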

Table 3. Test for relative order of S. pombe genes in the core set FB
(order specified by S. cerevisiae orthologs)

                 P-values
Experiment       Based on all 35       Based on final 28
                 S. pombe genes        S. pombe genes
Oliva cdc        0.08                  0.53
Oliva elut1      0.69                  0.98
Oliva elut2      0.14                  0.41
Peng cdc         0.24                  0.99
Peng elut        0.57                  0.99
Rust cdc1        5.37E-11              0.34
Rust cdc2        0.19                  0.86
Rust elut1       0.07                  0.88
Rust elut2       0.24                  0.99
Rust elut3       2.88E-05              0.53
Lack of fit      46.75                 3.57

A question of interest is whether we can identify a
subset of the 35 genes that conserve the relative order
between the two yeasts. Since the number of all possible
subsets (of various sizes) is extremely large, it would be
practically impossible to enumerate all possible subsets
of all sizes and then test a null hypothesis such as the
one appearing in Equation (1) for each subset. This
problem resembles the classical problem of selection of
variables (or model selection) in linear regression
analysis. Accordingly, we developed a Forward Selection
Algorithm (FSA), which is described in the Supplementary
Data. Similar to the forward selection procedure in
classical linear regression analysis, the FSA proceeds
systematically by entering one gene at a time into the test
for relative order according to its periodicity rank
assigned by cyclebase. The smaller the rank, the more
periodic the gene is and hence the more reliable its phase
angle estimate is likely to be. The proposed FSA begins with
all ortholog pairs that have a cyclebase rank < 100. Thus, a
gene is included in Step 1 of FSA if both its fission yeast
and its budding yeast orthologs have a rank < 100. Details of
the subsequent steps and the implementation of FSA are
provided in the Supplementary Data.
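The general shape of a rank-driven forward selection can be sketched as follows. This is only an illustration and not the FSA of the Supplementary Data: the retention rule (keep a gene if no experiment rejects the order at level `alpha`), the function `forward_selection` and the callable `order_pvalues` are all assumptions introduced here, and the toy P-value function stands in for the actual order test.

```python
def forward_selection(genes_by_rank, seed_size, order_pvalues, alpha=0.05):
    """Greedy selection of genes whose relative order is not rejected.

    genes_by_rank : gene names sorted by periodicity rank (most periodic first)
    seed_size     : number of top-ranked genes forming the starting set
    order_pvalues : hypothetical callable mapping a list of genes to a list of
                    per-experiment P-values for the null hypothesis of order
    alpha         : assumed level at which a single experiment rejects the order
    """
    selected = list(genes_by_rank[:seed_size])
    for gene in genes_by_rank[seed_size:]:
        candidate = selected + [gene]
        # Assumed rule: retain the gene only if no experiment rejects the order.
        if min(order_pvalues(candidate)) > alpha:
            selected = candidate
    return selected

def toy_order_pvalues(genes):
    """Stand-in for the order test: gene 'g3' is assumed to break the order."""
    return [0.001, 0.40] if "g3" in genes else [0.80, 0.65]

print(forward_selection(["g1", "g2", "g3", "g4"], 2, toy_order_pvalues))
```

With the toy P-value function the gene assumed to violate the order is skipped, and selection continues with the remaining genes in rank order.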

Using FSA (Supplementary Table S4), we discover that
28 out of 35 S. pombe genes, namely, cdc18, ssb1, cdc22,
msh6, mrc1, pol1, psm3, rad21, cig2, pol2, mik1, h3.3, hhf1,
hht3, hta2, htb1, pht1, klp5, fkh2, ace2, plo1, chs2, cdc15,
imp2, sid2, slp1, SPAC1705.03c and eng1, potentially satisfy
the same order as their S. cerevisiae orthologs. Thus, the
relative order of these 28 genes seems to be conserved
between the two species of yeasts. For these genes, the
null hypothesis is rejected in none of the experiments
even at a level of significance as high as 0.30 (Table 3).
It is also interesting to note that the lack of fit criterion L
based on all 35 genes was 46.75 and it dropped to 3.57 for
the above 28 genes selected by FSA.

Similar to genes in FB, we also tested the hypothesis in
Equation (2) for genes in the core set FH and found that the
relative order was rejected in 6 out of 10 experiments at a
P < 0.001 (Table 4). Using FSA we found ace2, cdc18, mik1,
histones (hhf1, hta2), rnc1, top2, cdc25, plo1 and slp1 to
satisfy the same relative order as their human orthologs
(Table 4). Among these 10 genes, ace2, cdc18, mik1, histones
(hhf1, hta2) and plo1 also satisfied the relative order
specified by their S. cerevisiae orthologs. Recall that,
evolutionarily, humans and fungi are ~1.5 billion years apart
and budding and fission yeasts are nearly a billion years
apart (21). Thus, it appears that the above six genes are
evolutionarily conserved in their relative order of peak
expression during the cell division cycle (Figure 4 and
Table 5). These six genes are well known in the literature to
play a critical role during the cell division cycle. For
example, the transcription factor ace2 and the polo-kinase
plo1 are well-known hubs of early M phase clusters (22), the
cell cycle gene cdc18 is a key component of pre-replication
complexes for the onset of S phase (23), histones hhf1 and
hta2 play an important role during the S phase and mik1 is
critical in the establishment and maintenance of the DNA
damage check point (24).

Table 4. Test for relative order of S. pombe genes in the core set FH
in the 10 experiments (order specified by H. sapiens orthologs)

                 P-values
Experiment       Based on all 24       Based on final 10
                 S. pombe genes        S. pombe genes
Oliva cdc        0.03                  0.60
Oliva elut1      0.10                  0.93
Oliva elut2      0.19                  0.95
Peng cdc         1.10E-03              0.95
Peng elut        0.07                  0.83
Rust cdc1        4.12E-10              1
Rust cdc2        2.37E-06              0.93
Rust elut1       0                     0.92
Rust elut2       1.90E-13              0.93
Rust elut3       0.06                  1
Lack of fit      > 100 000             1.10

Figure 4. A core set of signature cell cycle genes with relative order of
time to peak expression conserved among S. pombe, S. cerevisiae and
H. sapiens. Sectors and approximate locations of genes are drawn
according to S. pombe. S. pombe genes are in green, S. cerevisiae
orthologs are in red and H. sapiens orthologs are in blue.

Table 5. A core set of signature cell cycle genes

S. pombe gene (S. cerevisiae,      Cell cycle phase
H. sapiens orthologs)              S. pombe    S. cerevisiae    H. sapiens
plo1 (CDC5, PLK1)                  G2          G2               M
ace2 (SWI5, ZNF367)                G2/M        G2               G1
cdc18 (CDC6, CDC6)                 M           M                G1/S
mik1 (SWE1, PKMYT1)                M           G1/S             S
hhf1 (HHF1, HIST2H4B)              G1/S        S                S
hta2 (HTA2, H2AFX)                 G1/S        S                S/G2

Cell cycle phases are obtained from cyclebase.

To ensure that our statistical test has sufficient power to
detect the alternative hypothesis, i.e. to reject the null
hypothesis that the genes in both species satisfy the same
relative order when it is false, we conducted a simulation
study for the fission and budding yeast data by randomly
permuting the order of the genes in Step 1 of FSA and
applying the algorithm. We considered 100 permutations and
performed the first step of FSA on each permuted data set.
The null hypothesis was rejected for all 100 permutations. We
also found that P < 0.05 in at least 5 out of the 10
experiments, and this occurred in every one of the 100 random
permutations we considered. Note that the binomial
probability of observing a P-value of 0.05 in at least 5 out
of 10 experiments by random chance is $6.36 \times 10^{-5}$,
which makes this a very unlikely event. Yet, in all 100
random permutations we found at least 5 out of 10 experiments
to have P < 0.05, thus suggesting that our test is reasonably
powerful to reject the null hypothesis of relative order if
the hypothesis is not true. In our simulation study, we did
not investigate the power of our test for alternatives in
which the order among the genes is not well conserved but is
not entirely random either. As with any statistical test,
there will be a reduction in power as we get closer to the
null hypothesis. In other words, if the true order is a very
minor perturbation of the null hypothesis, then the
probability of rejecting the null hypothesis would be smaller
than when the true order is substantially different from the
null hypothesis. In a future project, we plan to investigate
this problem in greater detail.



DISCUSSION 

Since cell cycle genes follow a synchronized pattern of 
expression (25), one may speculate that some of the cell 
cycle genes are functionally conserved through evolution. 
There is an intrinsic order to the peak expression among 
the cell cycle genes so that they are converted into proteins 
in a well-synchronized manner to execute their respective 
functions during cell cycle. Consequently, the relative 
timing of peak expression of some of the cell cycle genes 
must be conserved through evolution. 

Often the order among genes is determined using heat
maps and published literature. There does not seem to
exist a formal statistical methodology to test hypotheses
regarding the order among genes in a given experiment.
In this article, we have developed a novel statistical
methodology that can be used for testing relative order among
the phase angles of cell cycle genes. Using this methodology,
we demonstrated that a core subset of 28 S. pombe genes have
the same relative timing of peak expression as their
S. cerevisiae orthologs. This number increases to 32 if we
reduce the stringency of our criterion. Thus, it may be
reasonable to infer that among the 35 genes in the core set
FB, at least 80% satisfy the same relative order of peak
expression as their S. cerevisiae orthologs. Similarly, ~40%
of the FH core set genes (10 out of 24) satisfy the same
relative order of peak expression as their H. sapiens
orthologs.

Although this article takes the first step toward a formal
statistical methodology for answering questions about
conservation of the relative order of cell cycle genes, it is
important to acknowledge that, analogous to classical
linear regression analysis, one may consider alternatives
to FSA and derive an improved algorithm.

In this article, we took a 'conservative approach' when
formulating the hypotheses appearing in Equations (1)
and (2). Note that the average cyclebase ranks of the
S. cerevisiae and H. sapiens orthologs used in this article
are 126.57 and 121.21, respectively. These are almost
twice the average cyclebase rank of the S. pombe genes,
which is 66.17. Since a higher cyclebase rank corresponds
to poorer periodicity, there is potentially greater
uncertainty in the phase angles of the S. cerevisiae and
H. sapiens orthologs than in those of the S. pombe genes.
Since we formulated our null hypotheses using S. cerevisiae
and H. sapiens orthologs and used S. pombe data to test them,
the FSA is likely to select fewer genes than it otherwise
would. The above formulation resembles the classical 'single
sample' statistical hypothesis testing problem. It would be
useful to extend our procedure so that uncertainties in both
orthologs are taken into account when formulating the testing
problem, resembling the classical 'two sample' testing
problem. Note that the 'two-sample' problem for testing the
equality of two sets of orderings is not well developed even
for Euclidean space data, and the problem is substantially
more complicated for circular data.

The proposed relative orders for S. cerevisiae and
H. sapiens were determined using the peak times
reported in cyclebase and the published literature. We
recognize that the exact order among some of the
'neighboring' genes is difficult to ascertain. Thus, there is
a potential for misspecification of the relative order. This
resembles the classical problem of model misspecification
that occurs so commonly in a variety of situations. If a biologist
chooses to refine our proposed relative order based on 
her/his understanding of the functions and order of the 
genes, then she/he may explore such alternative orders and 
test them using our proposed methodology. A biologist 
could also select best fitting relative orders using the 
lack of fit criterion introduced in this article. Hence in 
this article we have provided a general methodology that 
would allow biologists to hypothesize a sequential order of 
peak expression for cell cycle genes and test it. 

Freely downloadable, user-friendly SAS-based software can be
obtained either by contacting the first author or by visiting
www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.
An R package containing this and other circular data analysis
routines is being developed and will soon be made available
at the above address.



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 
Supplementary Tables S1-S4. Supplementary Information 
and Supplementary References [26-31]. 